Journal: bioRxiv
Article Title: OctopusV and TentacleSV: a one-stop toolkit for multi-sample, cross-platform structural variant comparison and analysis
doi: 10.1101/2025.03.24.645012
Figure Legend Snippet: OctopusV is organized into three primary modules: the Input layer, the Correct module, and the Merge module, along with supporting modules. The Input layer processes SV data from multiple sequencing platforms (NGS, PacBio, ONT, and others) and handles VCF files from standard SV callers. Several common callers are presented as examples; however, OctopusV can work with VCF-formatted SV calls from any caller or sequencing technology. The Correct module processes VCF files through sequential steps: BND extraction, pattern recognition of breakpoint orientations (shown by N[…] and […]N patterns), and SV type classification into standard forms (duplication, inversion, intra/inter-chromosomal TRA). The Merge module merges calls based on event coordinates and properties, supporting length overlap assessment, Jaccard index calculation, and set operations (union, intersection, specific). This module handles different SV types (DEL/DUP/INV, TRA, INS) with type-specific coordinate matching criteria (δstart, δend thresholds). Additional modules provide benchmarking, format conversion, and visualization capabilities. The output includes merged results in a customized VCF format and visualization options (UpSet plots, SV size distribution, SV type distribution, and chromosome distribution). The customized VCF format serves as an intermediate representation that facilitates integration between modules. OctopusV also generates interactive HTML outputs that allow users to explore SV data in a web browser.
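The breakpoint-orientation patterns and coordinate criteria named in the legend can be illustrated with a minimal Python sketch. It is not OctopusV's implementation: the names `Breakend`, `parse_bnd_alt`, `interval_jaccard`, and `within_thresholds` are hypothetical, the intervals are assumed to be half-open, and the threshold values are left to the caller because the legend does not state them. The only outside fact relied on is the VCF specification's four breakend ALT forms (`t[p[`, `t]p]`, `]p]t`, `[p[t`).

```python
import re
from typing import NamedTuple, Optional

# Breakend ALT forms from the VCF specification: t[p[  t]p]  ]p]t  [p[t
_BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn.]*)(?P<b1>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[ACGTNacgtn.]*)$"
)


class Breakend(NamedTuple):  # hypothetical container for this sketch
    mate_chrom: str
    mate_pos: int
    pattern: str  # one of "N[p[", "N]p]", "]p]N", "[p[N"


def parse_bnd_alt(alt: str) -> Optional[Breakend]:
    """Return the mate position and orientation pattern, or None if not a BND."""
    m = _BND_ALT.match(alt)
    if m is None or m["b1"] != m["b2"]:
        return None
    lead, trail = m["lead"], m["trail"]
    if bool(lead) == bool(trail):  # bases must sit on exactly one side
        return None
    b = m["b1"]
    pattern = f"N{b}p{b}" if lead else f"{b}p{b}N"
    return Breakend(m["chrom"], int(m["pos"]), pattern)


def interval_jaccard(start_a: int, end_a: int, start_b: int, end_b: int) -> float:
    """Jaccard index of two half-open intervals: |A ∩ B| / |A ∪ B|."""
    inter = max(0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0


def within_thresholds(start_a: int, end_a: int, start_b: int, end_b: int,
                      max_dstart: int, max_dend: int) -> bool:
    """True when both the start and end offsets fall within caller-chosen limits."""
    return abs(start_a - start_b) <= max_dstart and abs(end_a - end_b) <= max_dend
```

For example, `parse_bnd_alt("G]17:198982]")` reports a mate at chromosome 17, position 198982, with the `N]p]` pattern, and two calls spanning 100-200 and 150-250 share 50 of 150 covered bases, giving a Jaccard index of one third.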
Article Snippet: The input layer handles variant call format (VCF) files from multiple sequencing platforms, including next-generation sequencing (NGS), Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT), and from commonly used SV callers, such as Manta, LUMPY, SvABA, DELLY, PBSV, SVIM, Sniffles, CuteSV, SVDSS, and DeBreak, converting them into a standardized format for analysis.
Techniques: Sequencing, Extraction